Skip to content

Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation#554

Merged
MaxGhenis merged 21 commits intomainfrom
calibrate-retirement-contributions
Mar 4, 2026
Merged

Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation#554
MaxGhenis merged 21 commits intomainfrom
calibrate-retirement-contributions

Conversation

@PavelMakarchuk
Copy link
Collaborator

@PavelMakarchuk PavelMakarchuk commented Feb 25, 2026

Summary

Calibrates retirement contribution variables in the CPS/Extended CPS pipeline with three major changes:

  • Proportional RETCB_VAL allocation: Splits CPS retirement contributions into traditional/Roth 401(k) and traditional/Roth IRA using admin-data shares from imputation_parameters.yaml (dc_share=0.908, ira_share=0.092, roth_dc_share=0.15, trad_ira_share=0.392)
  • QRF imputation for PUF clone half: Trains a Quantile Random Forest on the CPS half to predict retirement contributions onto the PUF clone half, ensuring income-contribution consistency (e.g., $0-wage PUF clones get $0 in 401(k) contributions)
  • Calibration targets: Adds national calibration targets for traditional 401(k) ($482.7B), Roth 401(k) ($85.2B), traditional IRA ($13.2B), Roth IRA ($35.0B), and SE pension ($29.5B)

Additional changes

  • SE pension cap for dual-income filers: Caps SE pension at min(25% of SE income, IRS dollar limit) so dual-income filers retain a remainder for 401(k)/IRA. Previously all of RETCB_VAL went to SE pension for anyone with SE income.
  • IRA gate fix: IRA pool now available to anyone with earned income (wages or SE), not just wage earners. SE-only filers can now receive IRA contributions.
  • Dataset integrity tests: Ensures formula variables are not stored in dataset files, preventing stale-value bugs
  • policyengine-us upgrade: Updates to v1.587.0
  • conftest.py fix: Fixes microimpute mocking to use try/except ImportError instead of checking sys.modules, preventing MagicMock from replacing the real QRF during pytest
  • Retirement limits consolidated: Moved contribution limits from duplicated dicts in cps.py / puf_impute.py to single source in imputation_parameters.yaml
  • Exception logging improved: _qrf_ss_shares and _impute_retirement_contributions now log full tracebacks on failure
  • Redundant conftest removed: Inner test_calibration/conftest.py mock removed (root conftest propagates)
  • Validation script de-duplicated: Imports targets from loss.py and limits from YAML instead of hardcoding

Test plan

  • 35 unit tests for retirement QRF imputation (test_retirement_imputation.py)
  • Dataset integrity tests (test_no_formula_variables_stored.py)
  • CI passes all existing tests (conftest fix resolves test_cps_has_auto_loan_interest / test_cps_has_net_worth failures)
  • Post-build validation script (validation/validate_retirement_imputation.py) confirms aggregates match targets

🤖 Generated with Claude Code

Copy link
Collaborator

@baogorek baogorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just please update this one bullet point in the PR body (if this is indeed correct):

Says:

  • Remove roth_ira_contributions target ($39B) — structurally $0 due to CPS allocation bug where traditional IRA always exhausts the IRA limit first

But:
code actually keeps roth_ira_contributions and updates it from $39B to $35.0B in both loss.py and etl_national_targets.py (which it can do be

@PavelMakarchuk PavelMakarchuk changed the title Fix and add retirement contribution calibration targets Calibrate retirement contributions: targets, SS reconciliation, and QRF imputation Feb 26, 2026
@PavelMakarchuk PavelMakarchuk marked this pull request as draft February 26, 2026 23:08
@PavelMakarchuk PavelMakarchuk marked this pull request as ready for review March 3, 2026 10:45
@PavelMakarchuk PavelMakarchuk requested a review from baogorek March 3, 2026 10:51
@PavelMakarchuk
Copy link
Collaborator Author

@baogorek rewrote the PR fairly substantially since your last review. My first attempt at this so please be as nit-picky as needed.

MaxGhenis added a commit that referenced this pull request Mar 3, 2026
Replace hard-coded RETIREMENT_LIMITS dict with a shared utility that
reads from policyengine-us's parameter tree at runtime. This ensures
limits stay in sync as policyengine-us is updated, and eliminates a
maintenance risk when the same dict is duplicated in puf_impute.py
(PR #554).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Collaborator

@baogorek baogorek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PavelMakarchuk In full transparency, my review was focused on how disruptive this PR would be to #538, which fortunately is not at all. Looking at the printout of items I got from Claude, it looks like there are some duplication issues that are not that serious.

Point 5 about the subsampling is interesting. I believe this is where @juaristi22 found that 40,000 was where diminishing returns started to kick in. But I'm reading that this is new code, and if you found that it works well, it's not a blocker.

I'm going to go ahead and approve, but please consider these points before you merge:

  1. Retirement limits duplicated between cps.py and puf_impute.py — _get_retirement_limits() is a copy-paste of the
    limits dict in cps.py with the comment "duplicated here to avoid circular imports." These should be extracted to a
    shared location — imputation_parameters.yaml already holds the allocation shares and would be a natural home for the
    limits too. Two copies will inevitably drift.
  2. _qrf_ss_shares swallows exceptions silently — The except Exception block returns None without logging the
    exception or traceback. _impute_retirement_contributions at least logs %s, e, but neither logs exc_info=True. When
    these fail in production you'll have no idea why.
  3. TestLimitsMatchCps doesn't actually cross-check against cps.py — Despite the name, this test just re-calls
    _get_retirement_limits() with hardcoded values. A real cross-check would import the CPS limits and assert equality —
    which would also catch the duplication problem from point 1.
  4. Duplicate conftest.py — Both root conftest.py and policyengine_us_data/tests/test_calibration/conftest.py mock
    microimpute identically. Root conftest propagates downward in pytest, so the inner one is redundant. Note: the
    calibration test conftest on my branch has different fixtures (db_uri, dataset_path), so we'll have a merge conflict
    — easy to resolve by combining both into one file, but worth being aware of.
  5. QRF training subsample of 5000 is undocumented — CPS has ~200K person records. Why 5000? If it's for speed, a
    comment explaining the tradeoff would help. If arbitrary, a larger sample would improve predictions.
  6. SE + wages people get $0 in 401(k)/IRA — In the proportional RETCB_VAL allocation, if someone has both SE income
    and wages, the entire amount goes to self_employed_pension_contributions first, leaving remaining = 0. Many people
    have W-2 jobs plus side gigs. Worth at least a comment acknowledging this as a known limitation.
  7. Validation script hardcodes targets — validation/validate_retirement_imputation.py re-declares all target values
    instead of importing NATIONAL_CALIBRATION_TARGETS from loss.py. These will drift.

Worth discussing:

  1. SE + wages people get $0 in 401(k)/IRA — In the proportional RETCB_VAL allocation, if someone has both SE income
    and wages, the entire amount goes to self_employed_pension_contributions first, leaving remaining = 0. Many people
    have W-2 jobs plus side gigs. Worth at least a comment acknowledging this as a known limitation.
  2. Validation script hardcodes targets — validation/validate_retirement_imputation.py re-declares all target values
    instead of importing NATIONAL_CALIBRATION_TARGETS from loss.py. These will drift.

Looks good:

  • _drop_formula_variables is a solid cleanup — my stacked dataset builder already follows this pattern (filtering to
    input_variables), so making ExtendedCPS generation consistent is welcome.
  • Calibration target sourcing is well-documented.
  • The conftest try/except ImportError fix for the microimpute mock is the right approach.

MaxGhenis added a commit that referenced this pull request Mar 4, 2026
* Read retirement limits from policyengine-us parameters

Replace hard-coded RETIREMENT_LIMITS dict with a shared utility that
reads from policyengine-us's parameter tree at runtime. This ensures
limits stay in sync as policyengine-us is updated, and eliminates a
maintenance risk when the same dict is duplicated in puf_impute.py
(PR #554).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fixup! Read retirement limits from policyengine-us parameters

* Revert test formatting to match CI Black version

The fixup reformatted SQL strings in test_database_build.py using
a newer Black version that disagrees with CI's lgeiger/black-action.
Revert to the original formatting that CI accepts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add tests for retirement limits utility

Verifies get_retirement_limits returns correct IRS contribution
limits for 2020, 2023, and 2025 by comparing against known values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Handle both old and new policyengine-us catch-up parameter layouts

The SECURE 2.0 update (merged to pe-us main, not yet released)
renames children["401k"] to children["k401"] and changes it from
a simple int to an age-bracketed scale. Handle both layouts so the
code works with the current release and future releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Use k401 parameter directly, bump policyengine-us>=1.572.5

Drop dual-path handling in favor of the current parameter layout
(k401 scale with age brackets from SECURE 2.0). Bump the minimum
policyengine-us version to 1.572.5 which introduced this parameter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@MaxGhenis
Copy link
Contributor

I've merged #566 which extracts the retirement limits into a shared utility (utils/retirement_limits.py) that reads from policyengine-us parameters at runtime. When you rebase onto main, you can replace the duplicated limits dict in puf_impute.py with an import from that utility — addresses Ben's point 1 cleanly.

After rebasing, this should be good to merge once CI passes. Nice work on the QRF imputation approach.

@MaxGhenis
Copy link
Contributor

Update: tried to rebase this for you but the branch is based on a very old commit (217 commits behind main). The cherry-pick/rebase hits conflicts because several files have been restructured since then (puf_impute.py, etl_national_targets.py, cps.py, loss.py, etc.).

You'll need to rebase locally — you know the retirement code best so the conflict resolution will be straightforward for you. After rebasing, you can also import from policyengine_us_data.utils.retirement_limits (from #566) instead of duplicating the limits dict.

PavelMakarchuk and others added 10 commits March 4, 2026 14:44
- Correct traditional_ira_contributions: $25B → $13.2B (SOI 1304
  Table 1.4 actual deduction, not total contributions)
- Add traditional_401k_contributions: $567.9B (BEA/FRED employee
  DC contributions)
- Add self_employed_pension_contribution_ald: $29.5B (SOI 1304
  Table 1.4 Keogh plan deduction)
- Remove roth_ira_contributions: structurally $0 due to CPS
  allocation bug (#553)
- Update both loss.py and etl_national_targets.py

Closes #553

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace sequential waterfall with proportional allocation based on
administrative data. The old waterfall gave 401(k) first priority,
consuming all of RETCB_VAL and leaving IRA contributions at $0 for
every record. The Roth IRA allocation was also mathematically
guaranteed to produce $0.

The new approach splits RETCB_VAL proportionally:
- DC vs IRA: 90.8% / 9.2% (BEA/FRED vs IRS SOI)
- Within DC: 85% traditional / 15% Roth (Vanguard/PSCA)
- Within IRA: 39.2% traditional / 60.8% Roth (IRS SOI Tables 5 & 6)

All fractions are stored in imputation_parameters.yaml with sources.
Contribution limits are still enforced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Split 401(k) target: $567.9B total employee DC deferrals (BEA/FRED)
  into traditional $482.7B (85%) and Roth $85.2B (15%) using Vanguard
  How America Saves 2024 dollar share estimate
- Add roth_ira_contributions target: $35.0B from IRS SOI Accumulation
  Tables 5 & 6 (TY 2022) — direct administrative source
- Update etl_national_targets.py in parallel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When PUF imputation replaces social_security values, the sub-components
(retirement, disability, survivors, dependents) were left unchanged,
creating a mismatch. This caused a base-year discontinuity where projected
years had ~3x more SS recipients than the base year, producing artificial
9-point Gini swings.

The new reconcile_ss_subcomponents() function rescales sub-components
proportionally after PUF imputation. New recipients (CPS had zero SS
but PUF imputed positive) default to retirement.

Fixes #551

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New PUF recipients (CPS had zero SS) now get assigned based on age:
>= 62 -> retirement, < 62 -> disability. Matches the CPS fallback
logic. Falls back to retirement if age is unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of the simple age >= 62 heuristic, train a QRF on CPS records
(where the reason-code split is known) to predict shares for new PUF
recipients. Uses age, gender, marital status, and other demographics.
Falls back to age heuristic when microimpute is unavailable or
training data is insufficient (< 100 records).

14 tests covering:
- Proportional rescaling (existing recipients)
- Age heuristic fallback (4 tests)
- QRF share prediction (4 tests including sum-to-one and
  elderly-predicted-as-retirement)
- Edge cases (zero imputation, missing subs, no SS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CPS-PUF link is statistical (not identity-based), so the paired CPS
record's sub-component split is just one noisy draw. A QRF trained
on all CPS SS recipients gives a better expected prediction. Also
removes unnecessary try/except ImportError guard for microimpute.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Variables with formulas in policyengine-us are recomputed by the
simulation engine, so storing them wastes space and can mislead
validation. This removes 9 such variables from the extended CPS
output (saving ~15MB).

Also adds tests verifying:
- No formula variables are stored (except person_id)
- Stored input values match what the simulation computes
- SS sub-components sum to total social_security per person

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MaxGhenis and others added 10 commits March 4, 2026 14:46
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update _drop_formula_variables and tests to catch variables that use
`adds` or `subtracts` (e.g. social_security), not just explicit
formulas. These are also recomputed by the simulation engine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PUF clones previously copied CPS retirement contributions blindly,
so a record with $0 wages could have $50k in 401(k) contributions.

Train a QRF on CPS data (which has realistic income-to-contribution
relationships) and predict onto the PUF half using PUF-imputed income.
Post-prediction constraints enforce contribution caps, zero-out rules
for records with no wages/SE income, and non-negativity.

- Remove traditional_ira_contributions from IMPUTED_VARIABLES
- Add CPS_RETIREMENT_VARIABLES, RETIREMENT_PREDICTORS constants
- Add _get_retirement_limits() with year-specific 401k/IRA caps
- Add _impute_retirement_contributions() following _impute_weeks_unemployed pattern
- Integrate into puf_clone_dataset() variable routing loop
- Add 34 unit tests covering constraints, routing, and limits

Closes #561

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add income predictors (interest, dividends, pension, SS) to the
retirement contribution QRF, matching issue #561's recommendation.
Split RETIREMENT_PREDICTORS into demographic and income sublists so
the test side correctly sources income from PUF imputations.

Also add validation/validate_retirement_imputation.py for post-build
constraint checking and aggregate comparison against calibration targets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap QRF fit/predict in try/except so a crash returns zeros instead
of blowing up the entire puf_clone_dataset build.  Document the
pre_tax_contributions inconsistency (OVERRIDDEN vs CPS-trained
sub-components) for future reconciliation.

Tests: add test_qrf_failure_returns_zeros and
test_training_data_failure_returns_zeros (50 total).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The conftest.py files used `if "microimpute" not in sys.modules` to
decide whether to install a MagicMock.  On CI (Python 3.13, where
microimpute is fully installed), this condition was True at pytest
startup because no code had imported microimpute yet.  The mock
replaced the real package, so tests that triggered CPS generation
(add_rent → QRF imputation) silently got a MagicMock whose
__len__ returns 0, causing the "0 input values" ValueError.

Fix: use try/import instead of checking sys.modules, so the mock
is only installed when microimpute genuinely cannot be imported.

Also restores policyengine-us to 1.587.0 (the revert in 2566ac1
was a misdiagnosis — the conftest mock was the real root cause).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move retirement contribution limits to imputation_parameters.yaml
  (single source of truth for cps.py and puf_impute.py)
- Add exc_info=True to QRF exception handlers for full tracebacks
- Replace TestLimitsMatchCps with TestLimitsMatchYaml that actually
  cross-checks against the shared YAML source
- Remove redundant microimpute mock from test_calibration/conftest.py
  (root conftest already propagates)
- Document QRF training subsample of 5000 rationale
- Import targets/limits in validation script from loss.py and YAML
  instead of hardcoding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, filers with SE income had ALL of RETCB_VAL allocated to
self_employed_pension_contributions, leaving $0 for 401(k)/IRA even
when they also had wages. SE-only filers also got $0 in IRA despite
being eligible.

Changes:
- Cap SE pension at min(25% of SE income, IRS dollar limit) so
  dual-income filers retain a remainder for 401(k) and IRA
- Gate IRA pool on has_earned_income (wages OR SE) instead of
  has_wages only, so SE-only filers can receive IRA contributions
- IRA pool = remaining - dc_pool (absorbs all non-DC remainder)
- Add SE pension rate and dollar limits to imputation_parameters.yaml
  (2020-2025, sourced from IRS one-participant 401k rules)
- Apply matching SE pension cap in PUF QRF constraints
- Add test_se_pension_capped_at_rate_times_income

No change to calibration targets (independently sourced from
BEA/FRED and IRS SOI). Pre-calibration aggregates should shift
closer to targets since dual-income filers now contribute to all
account types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace _get_retirement_limits() YAML loader with wrapper that
  merges policyengine-us params (401k/IRA) + YAML SE pension params
- Remove retirement_contribution_limits block from YAML (now sourced
  from policyengine-us parameter tree via utils/retirement_limits.py)
- Update validation script to use get_retirement_limits() directly
- Update tests for new structure (no more year clamping for 401k/IRA)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@PavelMakarchuk PavelMakarchuk force-pushed the calibrate-retirement-contributions branch from 942de14 to a1446c4 Compare March 4, 2026 13:53
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@PavelMakarchuk
Copy link
Collaborator Author

@baogorek Thanks for the thorough review — all 7 points have been addressed:

  1. Retirement limits consolidated — Rebased on main which now includes PR Read retirement limits from policyengine-us parameters #566's shared get_retirement_limits() utility (reads 401k/IRA limits from policyengine-us parameter tree). SE pension params live in imputation_parameters.yaml. No more duplication.

  2. Exception logging — Added exc_info=True to both _qrf_ss_shares and _impute_retirement_contributions exception handlers.

  3. Cross-checking tests — Replaced TestLimitsMatchCps with TestLimitsStructure that validates keys and cross-checks against the policyengine-us parameter tree.

  4. Redundant conftest removed — Inner test_calibration/conftest.py mock deleted; root conftest propagates.

  5. Subsample documented — Added comment explaining the 5,000 record tradeoff (speed vs accuracy, diminishing returns above ~5K for 5 targets with 8 predictors).

  6. SE + wages fix — This was the big one. SE pension is now capped at min(25% × SE_income, IRS dollar limit) before the proportional DC/IRA split, so dual-income filers keep the remainder for 401(k)/IRA. Also fixed the IRA gate from has_wages to has_earned_income so SE-only filers can receive IRA contributions. Applied in both cps.py (CPS half) and puf_impute.py (QRF constraints on PUF half).

  7. Validation script de-hardcoded — Now imports targets from loss.py and limits from the shared utility.

CI: all quick checks pass (lint, lock freshness, smoke test, changelog, fork check). Full Modal build/test still running.

@PavelMakarchuk PavelMakarchuk requested a review from baogorek March 4, 2026 15:00
Copy link
Contributor

@MaxGhenis MaxGhenis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thorough and well-sourced PR. Key improvements:

  • Proportional RETCB_VAL split (replacing waterfall that starved IRA contributions) using BEA/FRED + SOI admin data shares
  • QRF retirement imputation for PUF half with proper IRS limit capping and income eligibility gating
  • SS sub-component reconciliation (QRF + age heuristic fallback) ensuring subs sum to imputed total
  • Formula variable stripping from stored dataset to prevent stale values misleading validation
  • Calibration targets corrected with proper SOI/BEA sources (traditional IRA $25B -> $13.2B, 401(k) split added)
  • Comprehensive tests (1200+ lines) covering constants, limits, QRF mock behavior, routing, SS reconciliation, and dataset integrity

Already uses the shared get_retirement_limits utility from #566.

@MaxGhenis MaxGhenis merged commit 7025f6a into main Mar 4, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants